GPU vs FPGA: A Comparative Analysis for Non-standard Precision

Authors

  • Umar Ibrahim Minhas
  • Samuel Bayliss
  • George A. Constantinides
Abstract

FPGAs and GPUs are increasingly used in a range of high-performance computing applications. When implementing numerical algorithms on either platform, we can choose to represent operands with different levels of accuracy. A trade-off exists between the numerical accuracy of arithmetic operators and the resources needed to implement them. Where algorithmic requirements for numerical stability are captured in a design description, this trade-off can be exploited to optimize performance by using high-accuracy operators only where they are most required. Support for half and double-double floating-point representations allows additional flexibility to achieve this. The aim of this work is to study the language and hardware support, and the achievable peak performance, for non-standard precisions on a GPU and an FPGA. A compute-intensive program, matrix-matrix multiply, is selected as a benchmark and implemented for a range of matrix sizes. The results show that for large-enough matrices, GPUs outperform FPGA-based implementations, but for some smaller matrix sizes, specialized FPGA floating-point operators for half and double-double precision can deliver higher throughput than an implementation on a GPU.
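To illustrate what a double-double representation means in the context of the matrix-matrix multiply benchmark, here is a minimal Python sketch. It is not the authors' GPU or FPGA implementation; it assumes CPython floats are IEEE 754 doubles, and the helper names (`two_sum`, `two_prod`, `dd_add`, `dd_matmul`) are hypothetical. Each value is held as an unevaluated sum of two doubles `(hi, lo)`, built from the standard error-free transformations attributed to Knuth and Dekker.

```python
def two_sum(a, b):
    """Error-free addition: returns (s, e) with s = fl(a + b) and a + b = s + e."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp split of a double into two halves that multiply exactly."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Error-free product: returns (p, e) with p = fl(a * b) and a * b = p + e,
    barring overflow and underflow."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def dd_add(x, y):
    """Add two double-double values (hi, lo); a simple, non-rigorous variant."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)  # renormalize so that |lo| is small relative to |hi|


def dd_matmul(A, B):
    """Multiply matrices of plain doubles, accumulating in double-double."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[(0.0, 0.0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = (0.0, 0.0)
            for t in range(k):
                acc = dd_add(acc, two_prod(A[i][t], B[t][j]))
            C[i][j] = acc
    return C
```

For example, `dd_matmul([[1e16, 1.0, -1e16]], [[1.0], [1.0], [1.0]])` yields `(1.0, 0.0)` for its single entry, whereas accumulating the same dot product in plain doubles loses the `1.0` and gives `0.0`. The extra operations per multiply-add hint at why the cost of such operators differs between platforms.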


Similar resources

Accelerated BLAST Performance with Tera-BLAST™: a comparison of FPGA versus GPU and CPU BLAST implementations

A number of technologies have emerged for accelerating similarity search algorithms in bioinformatics, including the use of field programmable gate arrays (FPGA), graphics processing units (GPU), and clusters of standard multicore CPUs. Here we present Tera-BLAST™, an FPGA-accelerated implementation of the BLAST algorithm, and compare the performance to GPU-accelerated BLAST and the industry s...


FPGA vs GPU Performance Comparison on the Implementation of FIR Filters

FIR filters are used in digital signal processing applications that require stopping one frequency band while passing another, or removing noise. Due to the complex structure and inherent parallelism of FIR filters, dedicated reconfigurable hardware is preferred for implementation rather than CPUs. Recently, GPGPU emerged as an effective technique for solving computation-intensive problems...


A Comparative Study on Instrumental Precision of Refrigerated and Non-Refrigerated Auto-Analyzers in Order to Improve Quality Assurance in Biochemistry Laboratory

Background and Objective: Quality control is one of the most important components in order to improve quality assurance in laboratories during analytical steps. For this purpose, coefficient of variation plays an important role. Due to the fast improvement in technology, application of inferential statistics for the comparisons ...


A High Throughput FPGA-Based Implementation of the Lanczos Method for the Symmetric Extremal Eigenvalue Problem

Iterative numerical algorithms with high memory bandwidth requirements but medium-size data sets (matrix size ∼ a few 100s) are highly appropriate for FPGA acceleration. This paper presents a streaming architecture comprising floating-point operators coupled with high-bandwidth on-chip memories for the Lanczos method, an iterative algorithm for symmetric eigenvalue computation. We show the Lanc...




Publication date: 2014